# Play the chunk above and this one to get the data into your Console
View(Friendly)
?Friendly
Many teachers and other educators are interested in understanding how to best deliver new content to students. In general, they have two choices of how to do this.
A study was performed to determine whether the Meshed or Before approaches to delivering content had any positive benefits on memory recall.
I want to test whether recall differs between taking the time to alternate previously recalled and not-yet-recalled words exactly one after another (Meshed) and simply repeating all words back in random order (Standard Free Recall, SFR). I will therefore compare the SFR and Meshed data. Because the sample sizes are small (10 observations per condition), I will use a Wilcoxon Rank Sum (Mann-Whitney) Test.
In my test, the null and alternative hypotheses are as shown below; the reasoning for these hypotheses is given in more detail further down.
\[ H_0: \text{the distributions are stochastically equal} \]
\[ H_a: \text{one distribution is stochastically greater than the other} \]
We will use a significance level of 0.05.
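# Keep only the two conditions being compared (Meshed and SFR)
Friendly2 <- Friendly %>%
  filter(condition %in% c("Meshed", "SFR"))
# filter() has already removed the "Before" rows, so no index-based removal is needed
F3 <- Friendly2
rownames(F3) <- 1:20      # renumber the 20 remaining rows
F4 <- F3 %>% drop_na()    # drop any rows with missing values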
p2 <- ggplot(F4, aes(x=condition, y=correct,fill = condition))+
geom_boxplot(outlier.shape = NA)+
labs(title="Does randomization affect recall? (Interactive)", x="Condition", y="Number of words recalled")+
stat_summary(fun=mean, geom="point", shape=4, size=4, color="red", fill="red") +
coord_cartesian(ylim = c(20, 45))+
theme(plot.title = element_text(hjust = 0.5))
ggplotly(p2)
This boxplot helps illustrate the reasoning for the above hypotheses: the two groups overlap, but the SFR results are more spread out. To see each group's results from another perspective, I also use a dot plot.
ggplot(F4, aes(x = factor(condition), y = correct, fill = factor(condition))) +
  geom_dotplot(binaxis = "y", stackdir = "up", position = "dodge", dotsize = 0.75, binwidth = 0.5) +
  # Set the limits inside coord_flip(); a separate coord_cartesian() would be replaced by it
  coord_flip(ylim = c(0, 45)) +
  # The legend title is set through the fill aesthetic
  labs(title = "Does randomization affect recall?", x = "Condition", y = "Correct", fill = "Legend")
In this dot plot we can see that the shapes of the two conditions are very similar, except that the Standard Free Recall results seem to be pulled, or stretched out, over a wider range.
The values below are the same data as those shown above in the box plot, just in tabular form.
F4 %>%
group_by(condition) %>%
summarise(min = min(correct), median = median(correct), mean = mean(correct), max = max(correct), sd = sd(correct), `Number of Observations` = n()) %>%
pander(caption="Summary of Standard Free Recall vs Meshed")
| condition | min | median | mean | max | sd | Number of Observations |
|---|---|---|---|---|---|---|
| Meshed | 30 | 36.5 | 36.6 | 40 | 3.026 | 10 |
| SFR | 21 | 27 | 30.3 | 39 | 7.334 | 10 |
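As a rough cross-check, the summary values above can be reproduced in base R. This is only a minimal sketch: the two score vectors are typed in by hand on the assumption that they match the Meshed and SFR rows of `Friendly`, and `summarise_group` is a hypothetical helper that is not part of the original analysis.

```r
# Scores typed in by hand; ASSUMED to match the Meshed and SFR rows of Friendly
meshed <- c(40, 39, 34, 37, 40, 36, 36, 38, 36, 30)
sfr    <- c(39, 25, 37, 25, 29, 39, 21, 39, 24, 25)

# Hypothetical helper: the same statistics as the summarise() call, for one group
summarise_group <- function(x) {
  c(min = min(x), median = median(x), mean = mean(x),
    max = max(x), sd = sd(x), n = length(x))
}

# One row per condition, mirroring the layout of the table
summary_table <- rbind(Meshed = summarise_group(meshed),
                       SFR    = summarise_group(sfr))
print(round(summary_table, 3))
```

If the assumed vectors are right, the rounded output should agree with the table, including the much larger standard deviation in the SFR group.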
Because the samples are small, the Wilcoxon Rank Sum test was the preferred method for testing the hypotheses. The box plots mainly let us see the greater spread (standard deviation) within the SFR group, whereas the dot plot gives a better idea of each sample's actual distribution. Given the small sample sizes, it is difficult to determine how accurately these plots reflect each population as a whole. For these reasons the current hypotheses were used, as they do not require that the two distributions have the same shape.
wilcox.test(F4$correct[F4$condition == "Meshed"],
F4$correct[F4$condition == "SFR"], mu = 0, alternative = "two.sided", conf.level = 0.95, conf.int = TRUE) %>%
pander(caption="Wilcoxon test for Meshed vs. SFR")
| Test statistic | P value | Alternative hypothesis | difference in location |
|---|---|---|---|
| 72 | 0.1015 | two.sided | 8.114 |
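To make the test statistic of 72 less of a black box, here is a small sketch of how it can be counted by hand. It compares every Meshed score with every SFR score and counts a tie as half a win. The score vectors are again typed in by hand and assumed to match the data used above; the names `diffs`, `w_stat`, and `w_check` are mine.

```r
# Scores typed in by hand; ASSUMED to match the Meshed and SFR rows of Friendly
meshed <- c(40, 39, 34, 37, 40, 36, 36, 38, 36, 30)
sfr    <- c(39, 25, 37, 25, 29, 39, 21, 39, 24, 25)

# 10 x 10 matrix of all pairwise differences (Meshed score minus SFR score)
diffs <- outer(meshed, sfr, "-")

# Count pairs where Meshed is larger; each tied pair counts as one half
w_stat <- sum(diffs > 0) + 0.5 * sum(diffs == 0)

# Cross-check against wilcox.test(); warnings about ties are expected here
w_check <- suppressWarnings(wilcox.test(meshed, sfr, alternative = "two.sided"))

print(c(by_hand = w_stat, wilcox_test = unname(w_check$statistic)))
print(w_check$p.value)
```

Read this way, the statistic says that the Meshed score was the larger one in 72 of the 100 possible Meshed/SFR pairings (with ties split evenly).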
Although our data contain a few ties, they are not so prevalent that we need to discard the test results. With a p-value of 0.1015, which is greater than our significance level of 0.05, we fail to reject the null hypothesis. This does not show that the two methods are stochastically equal, only that these samples do not provide sufficient evidence of a difference. The estimated difference in location suggests that, within our samples, participants in the Meshed condition recalled roughly 8 more words than participants in the SFR condition. Given the closeness of the results, I feel additional testing on this subject would be warranted. The current results hint that there may be a difference, with a 50/50 split of old and new concepts being more effective than random access to the information, but with these samples we do not have sufficient evidence at the 0.05 significance level to reject the null hypothesis that the two populations are stochastically equal.
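The "difference in location" reported by `wilcox.test()` is an estimate of the median of the differences between a Meshed score and an SFR score, not a difference in means or in medians. The sketch below computes the plain sample median of all pairwise differences for comparison. It uses hand-typed vectors assumed to match the data, and the result need not equal the 8.114 in the output, because R uses an approximation when ties are present.

```r
# Scores typed in by hand; ASSUMED to match the Meshed and SFR rows of Friendly
meshed <- c(40, 39, 34, 37, 40, 36, 36, 38, 36, 30)
sfr    <- c(39, 25, 37, 25, 29, 39, 21, 39, 24, 25)

# All 100 pairwise differences (Meshed score minus SFR score)
pairwise_diffs <- as.vector(outer(meshed, sfr, "-"))

# Sample median of the pairwise differences
location_shift <- median(pairwise_diffs)

# For contrast: the difference between the two group medians
median_gap <- median(meshed) - median(sfr)

print(c(median_of_differences = location_shift,
        difference_of_medians = median_gap))
```

Both numbers point the same way as the estimate quoted above, with Meshed ahead by somewhere around 8 to 10 words in these samples, but they answer slightly different questions and should not be used interchangeably.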